Sure Independence Screening for Ultra-High Dimensional Feature Space
Authors
Abstract
High dimensionality is a growing feature in many areas of contemporary statistics. Variable selection is fundamental to high-dimensional statistical modeling. For problems of large or huge scale p_n, computational cost and estimation accuracy are always two top concerns. In a seminal paper, Candes and Tao (2007) propose a minimum ℓ1 estimator, the Dantzig selector, and show that it mimics the ideal risk within a logarithmic factor log p_n. Their innovative procedure and remarkable result are challenged when the dimensionality is ultra high: the factor log p_n can be large and their uniform uncertainty condition can fail. Motivated by these concerns, in this paper we introduce the concept of sure screening and propose a fast and straightforward method via iteratively thresholded ridge regression, called Sure Independence Screening (SIS), to reduce high dimensionality to a relatively large scale d_n, say below sample size. An appealing special case of SIS is componentwise regression. In a fairly general asymptotic framework, SIS is shown to possess the sure screening property even for exponentially growing dimensionality. With ultra-high dimensionality reduced accurately to below sample size, variable selection becomes much easier and can be accomplished by refined lower-dimensional methods that have oracle properties. Depending on the scale of d_n, one can use, for example, the Dantzig selector or Lasso, the SCAD-penalized least squares of Fan and Li (2001), or the adaptive Lasso of Zou (2006).

Short title: Sure Independence Screening
AMS 2000 subject classifications: Primary 62J99; secondary 62F12
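The screening step described above, in its componentwise-regression special case, amounts to ranking features by the magnitude of their marginal correlation with the response and keeping the top d_n of them. A minimal sketch of that idea in NumPy follows; the function name and interface are illustrative, not taken from the paper:

```python
import numpy as np

def sis_screen(X, y, d):
    """Illustrative sketch of SIS via componentwise regression:
    rank features by absolute marginal correlation with y, keep top d."""
    # Standardize columns so the componentwise regression coefficient
    # of each feature reduces to its sample correlation with y.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    # Magnitudes of the componentwise (marginal) regression coefficients.
    omega = np.abs(Xs.T @ ys) / len(y)
    # Indices of the d features with the largest marginal correlation.
    return np.argsort(omega)[::-1][:d]
```

After this screening step, a refined method such as the Lasso or SCAD-penalized least squares would be run on the d retained features, as the abstract suggests.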
Related papers
Sure independence screening for ultrahigh dimensional feature space
High dimensionality is a growing feature in many areas of contemporary statistics. Variable selection is fundamental to high-dimensional statistical modeling. For problems of large or huge scale pn, computational cost and estimation accuracy are always two top concerns. In a seminal paper, Candes and Tao (2007) propose a minimum l1 estimator, the Dantzig selector, and show that it mimics the id...
arXiv:math/0612857v2 [math.ST] 27 Aug 2008 — Sure Independence Screening for Ultra-High Dimensional Feature Space
August 27, 2008 Abstract Variable selection plays an important role in high dimensional statistical modeling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularizati...
Sure Independence Screening
Big data is ubiquitous in various fields of sciences, engineering, medicine, social sciences, and humanities. It is often accompanied by a large number of variables and features. While adding much greater flexibility to modeling with enriched feature space, ultra-high dimensional data analysis poses fundamental challenges to scalable learning and inference with good statistical efficiency. Sure...
Discussion of "Sure Independence Screening for Ultra-High Dimensional Feature Space"
June 30, 2008 Abstract Variable selection plays an important role in high dimensional statistical modeling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using L1 regularization...
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models
A variable screening procedure via correlation learning was proposed in Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening is called NIS...
Publication date: 2008